Abstract 

We give a polynomial time approximation scheme (PTAS) for computing the supremum of 
a Gaussian process. That is, given a finite set of vectors $V \subseteq \mathbb{R}^d$, we compute a $(1+\epsilon)$-factor 
approximation to $\mathbb{E}_{X \leftarrow \mathcal{N}^d}[\sup_{v \in V} |\langle v, X \rangle|]$ deterministically in time $\mathrm{poly}(d) \cdot |V|^{O(1/\epsilon^2)}$. Previously, 
only a constant factor deterministic polynomial time approximation algorithm was known due 
to the work of Ding, Lee and Peres [DLP11]. This answers an open question of Lee [Lee10] and 
Ding [Din11]. 

The study of supremum of Gaussian processes is of considerable importance in probability 
with applications in functional analysis, convex geometry, and in light of the recent breakthrough 
work of Ding, Lee and Peres [DLP11], to random walks on finite graphs. As such our result 
could be of use elsewhere. In particular, combining with the recent work of Ding [Din11], our 
result yields a PTAS for computing the cover time of bounded degree graphs. Previously, such 
algorithms were known only for trees. 

Along the way, we also give an explicit oblivious linear estimator for semi-norms in Gaussian 
space with optimal query complexity. 

1 Introduction 

The study of supremum of Gaussian processes is a major area of study in probability and functional 
analysis as epitomized by the celebrated majorizing measures theorem of Fernique and Talagrand 
(see [LT91], [Tal05] and references therein). There is by now a rich body of work on obtaining tight 
estimates and characterizations of the supremum of Gaussian processes with several applications 
in analysis [Tal05], convex geometry [Pis99] and more. Recently, in a striking result, Ding, Lee and 
Peres [DLP11] used the theory to resolve the Winkler-Zuckerman blanket time conjectures [WZ96], 
indicating the usefulness of Gaussian processes even for the study of combinatorial problems over 
discrete domains. 

Ding, Lee and Peres [DLP11] used the powerful Dynkin isomorphism theory and majorizing 
measures theory to establish a structural connection between the cover time (and blanket time) 
of a graph G and the supremum of a Gaussian process associated with the Gaussian Free Field 
on G. They then use this connection to resolve the Winkler-Zuckerman blanket time conjectures 
and to obtain the first deterministic polynomial time constant factor approximation algorithm for 
computing the cover time of graphs. This latter result resolves an old open question of Aldous and 
Fill (1994). 

Besides showing the relevance of the study of Gaussian processes to discrete combinatorial 
questions, the work of Ding, Lee and Peres gives evidence that studying Gaussian processes could 
even be an important algorithmic tool; an aspect seldom investigated in the rich literature on 
Gaussian processes in probability and functional analysis. Here we address the corresponding 
computational question directly, which, given the importance of Gaussian processes in probability, 
could be of use elsewhere. In this context, the following question was asked by Lee [Lee10] and 
Ding [Din11]. 

Question 1.1. For every $\epsilon > 0$, is there a deterministic polynomial time algorithm that given a 
set of vectors $v_1, \ldots, v_m \in \mathbb{R}^d$ computes a $(1+\epsilon)$-factor approximation to $\mathbb{E}_{X \leftarrow \mathcal{N}^d}[\sup_i |\langle v_i, X \rangle|]$?¹ 

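We remark that Lee [Lee10] and Ding [Din11] actually ask for an approximation to $\mathbb{E}_{X \leftarrow \mathcal{N}^d}[\sup_i \langle v_i, X \rangle]$ 
(and not the supremum of the absolute value). However, this formulation results in a somewhat 
artificial asymmetry and for most interesting cases these two are essentially equivalent: if 
$\mathbb{E}_{X \leftarrow \mathcal{N}^d}[\sup_i \langle v_i, X \rangle] = \omega(\max_i \|v_i\|_2)$, then $\mathbb{E}_{X \leftarrow \mathcal{N}^d}[\sup_i |\langle v_i, X \rangle|] = (1 + o(1))\, \mathbb{E}_{X \leftarrow \mathcal{N}^d}[\sup_i \langle v_i, X \rangle]$.² 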

We shall overlook this distinction from now on. 

There is a simple randomized algorithm for the problem: sample a few Gaussian vectors and 
output the median supremum value for the sampled vectors. This, however, requires $O(d \log d/\epsilon^2)$ 
random bits. Using Talagrand's majorizing measures theorem, Ding, Lee and Peres give a 
deterministic polynomial time $O(1)$-factor approximation algorithm for the problem. This approach 
cannot yield a PTAS, as the majorizing measures characterization is bound to 
lose a universal constant factor. Here we give a PTAS for the problem, thus resolving the above 
question. 
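The randomized baseline described above can be sketched as follows. This is only an illustrative Monte Carlo sketch, not the deterministic algorithm of this paper; the function names, the number of trials, and the toy input (the standard basis of $\mathbb{R}^4$) are assumptions made for the example.

```python
import random
import statistics

def sup_abs_inner(vectors, x):
    """Return max_i |<v_i, x>| for a single point x."""
    return max(abs(sum(vi * xi for vi, xi in zip(v, x))) for v in vectors)

def randomized_sup_estimate(vectors, trials=2001, seed=0):
    """Median of sup_i |<v_i, X>| over independent standard Gaussian samples X."""
    rng = random.Random(seed)          # fixed seed only to make the sketch reproducible
    d = len(vectors[0])
    samples = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append(sup_abs_inner(vectors, x))
    return statistics.median(samples)

# Hypothetical toy input: the four standard basis vectors of R^4.
basis = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
estimate = randomized_sup_estimate(basis)
```

Each trial consumes $d$ fresh Gaussian samples, which is the source of the randomness cost that the deterministic algorithm avoids.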

Theorem 1.2. For every $\epsilon > 0$, there is a deterministic algorithm that given a set of vectors 
$v_1, \ldots, v_m \in \mathbb{R}^d$, computes a $(1+\epsilon)$-factor approximation to $\mathbb{E}_{X \leftarrow \mathcal{N}^d}[\sup_i |\langle v_i, X \rangle|]$ in time 
$\mathrm{poly}(d) \cdot m^{O(1/\epsilon^2)}$. 

Our approach is comparatively simpler than the work of Ding, Lee and Peres, using some 
classical comparison inequalities in convex geometry. 

We explain our result on estimating semi-norms with respect to Gaussian measures mentioned 
in the abstract in Section 2.2. 

We next discuss some applications of our result to computing cover times of graphs as implied 
by the works of Ding, Lee and Peres [DLP11] and Ding [Din11]. 

1.1 Application to Computing Cover Times of Graphs 

The study of random walks on graphs is an important area of research in probability, algorithm 
design, statistical physics and more. As this is not the main topic of our work, we avoid giving 
formal definitions and refer the readers to [AF], [Lov93] for background information. 

Given a graph $G$ on $n$ vertices, the cover time, $t_{\mathrm{cov}}(G)$, of $G$ is defined as the expected time a 
random walk on G takes to visit all the vertices in G when starting from the worst possible vertex 
in G. Cover time is a fundamental parameter of graphs and is extensively studied. Algorithmically, 
there is a simple randomized algorithm for approximating the cover time - simulate a few trials 
of the random walk on G for poly(n) steps and output the median cover time. However, without 
randomness the problem becomes significantly harder. This was one of the motivations of the work 

1 Throughout, $\mathcal{N}$ denotes the univariate Gaussian distribution with mean 0 and variance 1 and for a distribution 
$\mathcal{D}$, $X \leftarrow \mathcal{D}$ denotes a random variable with distribution $\mathcal{D}$. By an $\alpha$-factor approximation to a quantity $X$ we mean a 
number $p$ such that $p \le X \le \alpha p$. 

2 This follows from standard concentration bounds for supremum of Gaussian processes; we do not elaborate on it 
here as we ignore this issue. 
of Ding, Lee and Peres [DLP11] who gave the first deterministic constant factor approximation 
algorithm for the problem, improving on an earlier work of Kahn, Kim, Lovász and Vu [KKLV00] 
who obtained a deterministic $O((\log \log n)^2)$-factor approximation algorithm. For the simpler case 
of trees, Feige and Zeitouni [FZ09] gave an FPTAS. 
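The simple randomized approach mentioned above (simulate the walk and report a median) might look like the following sketch. It assumes a connected graph given as adjacency lists, runs each walk until every vertex is covered rather than capping it at poly(n) steps, and takes the worst start vertex; the helper names and the 8-cycle example are hypothetical.

```python
import random
import statistics

def cover_time_from(adj, start, rng):
    """Steps a simple random walk from `start` takes to visit every vertex."""
    seen = {start}
    current = start
    steps = 0
    while len(seen) < len(adj):
        current = rng.choice(adj[current])   # move to a uniformly random neighbour
        seen.add(current)
        steps += 1
    return steps

def estimate_cover_time(adj, trials=301, seed=0):
    """Median simulated cover time, maximised over the start vertex."""
    rng = random.Random(seed)
    worst = 0.0
    for start in range(len(adj)):
        runs = [cover_time_from(adj, start, rng) for _ in range(trials)]
        worst = max(worst, statistics.median(runs))
    return worst

# Hypothetical example graph: the cycle on 8 vertices.
n = 8
cycle = [[(v - 1) % n, (v + 1) % n] for v in range(n)]
cycle_estimate = estimate_cover_time(cycle)
```

Note that the median of the simulated cover times is only a proxy for the expected cover time; the sketch makes no claim about how close the two are.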

Ding, Lee and Peres also conjectured that the cover time of a graph G (satisfying a certain 
reasonable technical condition) is asymptotically equivalent to the supremum of an explicitly defined 
Gaussian process — the Gaussian Free Field on G. However, this conjecture though quite interesting 
on its own, is not enough to give a PTAS for cover time; one still needs a PTAS for computing 
the supremum of the relevant Gaussian process. Our main result provides this missing piece, thus 
removing one of the obstacles in their posited strategy to obtain a PTAS for computing the cover 
time of graphs. Recently, Ding [Din11] showed the main conjecture of Ding, Lee and Peres to 
be true for bounded-degree graphs and trees. Thus, combining his result (see Theorem 1.1 in 
[Din11]) with Theorem 1.2 we get a PTAS for computing cover time on bounded degree graphs 
with $\tau_{\mathrm{hit}}(G) = o(t_{\mathrm{cov}}(G))$.³ As mentioned earlier, such algorithms were previously known only for 
trees [FZ09]. 

2 Outline of Algorithm 

The high level idea of our PTAS is as follows. Fix the set of vectors $V = \{v_1, \ldots, v_m\} \subseteq \mathbb{R}^d$ and 
$\epsilon > 0$. Without loss of generality suppose that $\max_{v \in V} \|v\|_2 = 1$. We first reduce the dimension of $V$ 
by projecting $V$ onto a space of dimension $O((\log m)/\epsilon^2)$ à la the classical Johnson-Lindenstrauss 
lemma (JLL). We then give an algorithm that runs in time polynomial in the number of vectors but 
exponential in the underlying dimension. Our analysis relies on two elegant comparison inequalities 
in convex geometry: Slepian's lemma [Sle62] for the first step and Kanter's lemma [Kan77] for the 
second step. We discuss these modular steps below. 

2.1 Dimension Reduction 

We project the set of vectors $V \subseteq \mathbb{R}^d$ to $\mathbb{R}^k$ for $k = O((\log m)/\epsilon^2)$ to preserve all pairwise 
(Euclidean) distances within a $(1+\epsilon)$-factor as in the Johnson-Lindenstrauss lemma (JLL). We then 
show that the expected supremum of the projected Gaussian process is within a $(1+\epsilon)$ factor of 
the original value. The intuition is that the supremum of a Gaussian process, though a global 
property, can be controlled by pairwise correlations between the variables. To quantify this, we 
use Slepian's lemma, which helps us relate the suprema of two Gaussian processes by comparing 
pairwise correlations. Finally, observe that using known derandomizations of JLL, the dimension 
reduction can be done deterministically in time $\mathrm{poly}(d, m, 1/\epsilon)$ [EIO02]. 
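To make the distance-preservation property concrete, here is a minimal sketch of a randomized Gaussian projection, the standard randomized form of JLL. It is not the derandomized construction of [EIO02]; the function names, the dimensions (300 to 100), and the random test points are assumptions chosen for illustration.

```python
import math
import random

def random_projection(vectors, k, seed=0):
    """Map each vector through a k x d Gaussian matrix scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    d = len(vectors[0])
    rows = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(r[j] * v[j] for j in range(d)) for r in rows] for v in vectors]

def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distortion_range(original, projected):
    """Smallest and largest ratio of projected to original pairwise distance."""
    m = len(original)
    ratios = [dist(projected[i], projected[j]) / dist(original[i], original[j])
              for i in range(m) for j in range(i + 1, m)]
    return min(ratios), max(ratios)

# Hypothetical data: 12 random points in dimension 300, projected to dimension 100.
data_rng = random.Random(1)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(12)]
low, high = distortion_range(points, random_projection(points, k=100))
```

A random projection distorts distances in both directions; dividing the projected vectors by `low` would give a non-contracting map of the kind used in Section 3, at the cost of a slightly larger upper distortion.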

Thus, to obtain a PTAS it would be enough to have a deterministic algorithm to approximate 
the supremum of a Gaussian process in time exponential in the dimension $k = O((\log m)/\epsilon^2)$. 
Unfortunately, a naive argument by discretizing the Gaussian measure in $\mathbb{R}^k$ leads to a run-time 
of at least $k^{O(k)}$, which gives an $m^{O((\log \log m)/\epsilon^2)}$ algorithm. This question was recently addressed 
by Dadush and Vempala [DV12], who needed a similar sub-routine for their work on computing 
M-Ellipsoids of convex sets and give a deterministic algorithm with a run-time of $(\log k)^{O(k)}$. 
Combining their algorithm with the dimension reduction step gives a deterministic $m^{O((\log \log \log m)/\epsilon^2)}$ 
3 The hitting time $\tau_{\mathrm{hit}}(G)$ is defined as the maximum over all pairs of vertices $u, v \in G$ of the expected time for a 
random walk starting at $u$ to reach $v$. See the discussion in [Din11] for why this is a reasonable condition. 
time algorithm for approximating the supremum. We next get rid of this $\omega(1)$ dependence in the 
exponent. 

2.2 Oblivious Linear Estimators for Semi-Norms 

We in fact solve a more general problem by constructing an optimal linear estimator for semi-norms 
in Gaussian space. Let $\varphi : \mathbb{R}^k \to \mathbb{R}_+$ be a semi-norm, i.e., $\varphi$ is homogeneous and satisfies the triangle 
inequality. For normalization purposes, we assume that $1 \le \mathbb{E}_{x \leftarrow \mathcal{N}^k}[\varphi(x)]$ and that the Lipschitz 
constant of $\varphi$ is at most $k^{O(1)}$. Note that the supremum function $\varphi_V(x) = \sup_{v \in V} |\langle v, x \rangle|$ satisfies 
these conditions. Our goal will be to compute a $(1+\epsilon)$-factor approximation to $\mathbb{E}_{x \leftarrow \mathcal{N}^k}[\varphi(x)]$ in 
time $2^{O_\epsilon(k)}$. 

Theorem 2.1. For every $\epsilon > 0$, there exists a deterministic algorithm running in time $(1/\epsilon)^{O(k)}$ 
and space $\mathrm{poly}(k, 1/\epsilon)$ that computes a $(1+\epsilon)$-factor approximation to $\mathbb{E}_{X \leftarrow \mathcal{N}^k}[\varphi(X)]$ using only 
oracle access to $\varphi$. 

Our algorithm has the additional property of being an oblivious linear estimator: the set of 
query points does not depend on $\varphi$ and the output is a positive weighted sum of the evaluations 
of $\varphi$ on the query points. Further, the construction is essentially optimal as any such oblivious 
estimator needs to make at least $(1/\epsilon)^{\Omega(k)}$ queries (see Section 7). In comparison, the previous 
best bound of Dadush and Vempala [DV12] needed $(\log k)^{O(k)}$ queries. We also remark that the 
query points of our algorithm are essentially the same as those of Dadush and Vempala; however, 
our analysis is quite different and leads to better parameters. 

As in the analysis of the dimension reduction step, our analysis of the oblivious estimator relies 
on a comparison inequality, Kanter's lemma, that allows us to "lift" a simple estimator for the 
univariate case to the multi-dimensional case. 

We first construct a symmetric distribution $\mu$ on $\mathbb{R}$ that has a simple piecewise flat graph and 
sandwiches the one-dimensional Gaussian distribution in the following sense. Let $\nu$ be a "shrinking" 
of $\mu$ defined to be the probability density function (pdf) of $(1-\epsilon)x$ for $x \leftarrow \mu$. Then, for every 
symmetric interval $I \subseteq \mathbb{R}$, $\mu(I) \le \mathcal{N}(I) \le \nu(I)$. 

Kanter's lemma [Kan77] says that for pdfs $\mu, \nu$ as above that are in addition unimodal, the 
above relation carries over to the product distributions $\mu^k, \nu^k$: for every symmetric convex set 
$K \subseteq \mathbb{R}^k$, $\mu^k(K) \le \mathcal{N}^k(K) \le \nu^k(K)$. This last inequality immediately implies that semi-norms 
cannot distinguish between $\mu^k$ and $\mathcal{N}^k$: for any semi-norm $\varphi$, $\mathbb{E}_{\mu^k}[\varphi(x)] = (1 \pm \epsilon)\, \mathbb{E}_{\mathcal{N}^k}[\varphi(x)]$. We 
then suitably prune the distribution $\mu^k$ to have small support and prove Theorem 4.1. 

Our main result, Theorem 1.2, follows by first reducing the dimension as in the previous section 
and applying Theorem 4.1 to the semi-norm $\varphi : \mathbb{R}^k \to \mathbb{R}_+$, $\varphi(x) = \sup_{i \le m} |\langle u_i, x \rangle|$ for the projected 
vectors $\{u_1, \ldots, u_m\}$. 

3 Dimension Reduction 

The use of JLL type random projections for estimating the supremum comes from the following 
comparison inequality for Gaussian processes. We call a collection of real-valued random variables 
$\{X_t\}_{t \in T}$ a Gaussian process if every finite linear combination of the variables has a normal 
distribution with mean zero. For a reference to Slepian's lemma we refer the reader to Corollary 3.14 
and the following discussion in [LT91]. 
Theorem 3.1 (Slepian's Lemma [Sle62]). Let $\{X_t\}_{t \in T}$ and $\{Y_t\}_{t \in T}$ be two Gaussian processes such 
that for every $s, t \in T$, $\mathbb{E}[(X_s - X_t)^2] \le \mathbb{E}[(Y_s - Y_t)^2]$. Then, $\mathbb{E}[\sup_t X_t] \le \mathbb{E}[\sup_t Y_t]$. 

We also need a derandomized version of the Johnson-Lindenstrauss Lemma. 

Theorem 3.2 ([EIO02]). For every $\epsilon > 0$, there exists a deterministic $(d m^2 (\log m + 1/\epsilon)^{O(1)})$-time 
algorithm that given vectors $v_1, \ldots, v_m \in \mathbb{R}^d$ computes a linear mapping $A : \mathbb{R}^d \to \mathbb{R}^k$ for 
$k = O((\log m)/\epsilon^2)$ such that for every $i, j \in [m]$, $\|v_i - v_j\|_2 \le \|A(v_i) - A(v_j)\|_2 \le (1+\epsilon)\|v_i - v_j\|_2$. 

Combining the above two theorems immediately implies the following. 

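Lemma 3.3. For every $\epsilon > 0$, there exists a deterministic $(d m^2 (\log m + 1/\epsilon)^{O(1)})$-time algorithm 
that given vectors $v_1, \ldots, v_m \in \mathbb{R}^d$ computes a linear mapping $A : \mathbb{R}^d \to \mathbb{R}^k$ for $k = O((\log m)/\epsilon^2)$ 
such that 

$$\mathbb{E}_{x \leftarrow \mathcal{N}^d}\Big[\sup_i |\langle v_i, x \rangle|\Big] \le \mathbb{E}_{y \leftarrow \mathcal{N}^k}\Big[\sup_i |\langle A(v_i), y \rangle|\Big] \le (1+\epsilon)\, \mathbb{E}_{x \leftarrow \mathcal{N}^d}\Big[\sup_i |\langle v_i, x \rangle|\Big]. \qquad (3.1)$$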

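Proof. Let $V = \{v_1, \ldots, v_m\} \cup \{-v_1, \ldots, -v_m\}$ and let $\{X_v\}_{v \in V}$ be the Gaussian process where the 
joint distribution is given by $X_v = \langle v, x \rangle$ for $x \leftarrow \mathcal{N}^d$. Then, $\mathbb{E}_{x \leftarrow \mathcal{N}^d}[\sup_i |\langle v_i, x \rangle|] = \mathbb{E}[\sup_{v \in V} X_v]$. 

Let $A : \mathbb{R}^d \to \mathbb{R}^k$ be the linear mapping as given by Theorem 3.2 applied to $V$. Let $\{Y_v\}_{v \in V}$ 
be the "projected" Gaussian process with joint distribution given by $Y_v = \langle A(v), y \rangle$ for $y \leftarrow \mathcal{N}^k$. 
Then, $\mathbb{E}_{y \leftarrow \mathcal{N}^k}[\sup_i |\langle A(v_i), y \rangle|] = \mathbb{E}[\sup_{v \in V} Y_v]$. 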

Finally, observe that for any $u, v \in V$, 

$$\mathbb{E}[(X_u - X_v)^2] = \|u - v\|_2^2 \le \|A(u) - A(v)\|_2^2 = \mathbb{E}[(Y_u - Y_v)^2] \le (1+\epsilon)^2\, \mathbb{E}[(X_u - X_v)^2].$$

Combining the above inequality with Slepian's lemma (Theorem 3.1) applied to the pairs of 
processes $(\{X_v\}_{v \in V}, \{Y_v\}_{v \in V})$ and $(\{Y_v\}_{v \in V}, \{(1+\epsilon)X_v\}_{v \in V})$ it follows that 

$$\mathbb{E}[\sup_v X_v] \le \mathbb{E}[\sup_v Y_v] \le \mathbb{E}[\sup_v (1+\epsilon) X_v] = (1+\epsilon)\, \mathbb{E}[\sup_v X_v].$$

The lemma now follows. □ 

4 Oblivious Estimators for Semi-Norms in Gaussian Space 

In the previous section we reduced the problem of computing the supremum of a $d$-dimensional 
Gaussian process to that of a Gaussian process in $k = O((\log m)/\epsilon^2)$ dimensions. Thus, it suffices 
to have an algorithm for approximating the supremum of Gaussian processes in time exponential 
in the dimension. We will give such an algorithm that works more generally for all semi-norms. 

Let $\varphi : \mathbb{R}^k \to \mathbb{R}_+$ be a semi-norm. That is, $\varphi$ satisfies the triangle inequality and is homogeneous. 
For normalization purposes we assume that $1 \le \mathbb{E}_{X \leftarrow \mathcal{N}^k}[\varphi(X)]$ and the Lipschitz constant of $\varphi$ is at 
most $k^{O(1)}$. 

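Theorem 4.1. For every $\epsilon > 0$, there exists a set $S \subseteq \mathbb{R}^k$ with $|S| = (1/\epsilon)^{O(k)}$ and a function 
$p : \mathbb{R}^k \to \mathbb{R}_+$ computable in $\mathrm{poly}(k, 1/\epsilon)$ time such that the following holds. For every semi-norm 
$\varphi : \mathbb{R}^k \to \mathbb{R}_+$, 

$$(1 - \epsilon) \Big( \sum_{x \in S} p(x) \varphi(x) \Big) \le \mathbb{E}_{X \leftarrow \mathcal{N}^k}[\varphi(X)] \le (1 + \epsilon) \Big( \sum_{x \in S} p(x) \varphi(x) \Big).$$

Moreover, successive elements of $S$ can be enumerated in $\mathrm{poly}(k, 1/\epsilon)$ time and $O(k \log(1/\epsilon))$ space. 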
Theorem 2.1 follows immediately from the above. 

Proof of Theorem 2.1. Follows by enumerating over the set $S$ and computing $\sum_{x \in S} p(x) \varphi(x)$ by 
querying $\varphi$ on the points in $S$. □ 
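As a rough illustration of the estimator $\sum_{x \in S} p(x)\varphi(x)$, the following sketch builds the query points on the shifted grid $(\delta(\mathbb{Z}+1/2))^k$ inside the ball of radius $3\sqrt{k}$ and weights each point by the Gaussian mass of its cell, following the construction used in the proof of Theorem 4.1 below. It is a toy version for very small $k$: it enumerates the whole bounding cube, and the function names, the choice $k = 2$, $\delta = 0.1$, and the Euclidean-norm test function are assumptions made for the example.

```python
import itertools
import math

def gaussian_interval_mass(a, b):
    """Mass the standard univariate Gaussian assigns to the interval [a, b)."""
    cdf = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

def query_points_and_weights(k, delta):
    """Yield (x, p(x)) for grid midpoints x inside the ball of radius 3*sqrt(k)."""
    radius = 3.0 * math.sqrt(k)
    cells = int(math.ceil(radius / delta))
    # One-dimensional support: midpoint of each cell, weighted by the cell's Gaussian mass.
    axis = [(delta * (i + 0.5), gaussian_interval_mass(delta * i, delta * (i + 1)))
            for i in range(-cells, cells)]
    for combo in itertools.product(axis, repeat=k):
        x = [c[0] for c in combo]
        if sum(t * t for t in x) <= radius * radius:
            yield x, math.prod(c[1] for c in combo)   # product weight p(x)

def estimate_seminorm_mean(phi, k, delta):
    """Oblivious linear estimate: a fixed weighted sum of evaluations of phi."""
    return sum(w * phi(x) for x, w in query_points_and_weights(k, delta))

# Hypothetical test semi-norm: the Euclidean norm, whose Gaussian mean in R^2 is sqrt(pi/2).
euclid = lambda x: math.sqrt(sum(t * t for t in x))
approx = estimate_seminorm_mean(euclid, k=2, delta=0.1)
```

The query points and weights never look at `phi`, which is what makes the estimator oblivious; only the final weighted sum depends on the semi-norm.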

We now prove Theorem 4.1. Here and henceforth, let $\gamma$ denote the pdf of the standard univariate 
Gaussian distribution. Fix $\epsilon > 0$ and let $\delta > 0$ be a parameter to be chosen later. Let $\mu \equiv \mu_{\epsilon,\delta}$ be 
the pdf which is a piecewise-flat approximator to $\gamma$ obtained by spreading the mass $\gamma$ gives to an 
interval $I = [i\delta, (i+1)\delta)$ evenly over $I$. Formally, $\mu(z) = \mu(-z)$ and for $z > 0$, $z \in [i\delta, (i+1)\delta)$, 

$$\mu(z) = \frac{\gamma([i\delta, (i+1)\delta))}{\delta}. \qquad (4.1)$$


Clearly, $\mu$ defines a symmetric distribution on $\mathbb{R}$. We will show that for $\delta \ll \epsilon$ sufficiently small, 
semi-norms cannot distinguish the product distribution $\mu^k$ from $\mathcal{N}^k$: 

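Lemma 4.2. Let $\delta = (2\epsilon)^{3/2}$. Then, for every semi-norm $\varphi : \mathbb{R}^k \to \mathbb{R}$, 

$$(1 - \epsilon)\, \mathbb{E}_{X \leftarrow \mu^k}[\varphi(X)] \le \mathbb{E}_{Z \leftarrow \mathcal{N}^k}[\varphi(Z)] \le \mathbb{E}_{X \leftarrow \mu^k}[\varphi(X)].$$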
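The piecewise-flat density $\mu$ used in the lemma can be written down directly. The sketch below is one way to evaluate it with the standard error function; the function names and the sample value $\epsilon = 0.05$ are assumptions for illustration.

```python
import math

def gaussian_cdf(t):
    """Cumulative distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mu_density(z, delta):
    """Piecewise-flat density: Gaussian mass of the cell containing |z|, divided by delta."""
    i = math.floor(abs(z) / delta)                      # cell index with |z| in [i*delta, (i+1)*delta)
    return (gaussian_cdf((i + 1) * delta) - gaussian_cdf(i * delta)) / delta

eps = 0.05                                              # hypothetical accuracy parameter
delta = (2 * eps) ** 1.5                                # the choice delta = (2*eps)^(3/2)
# mu is constant on each cell, so summing density * width over cells gives its total mass
# (restricted here to the 800 cells closest to the origin).
total_mass = sum(mu_density((i + 0.5) * delta, delta) * delta for i in range(-400, 400))
```

By construction each cell keeps exactly its Gaussian mass, so `total_mass` should be 1 up to floating-point error and a negligible tail.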

We first prove Theorem 4.1 assuming the above lemma, whose proof is deferred to the next 
section. 

Proof of Theorem 4.1. Let $\hat\mu$ be the symmetric distribution supported on $\delta(\mathbb{Z} + 1/2)$ with pdf 
defined by 

$$\hat\mu(\delta(i + 1/2)) = \mu([i\delta, (i+1)\delta)),$$

for $i \ge 0$. Further, let $X \leftarrow \mu^k$, $\hat X \leftarrow \hat\mu^k$, $Z \leftarrow \mathcal{N}^k$. 

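We claim that $\mathbb{E}[\varphi(\hat X)] = (1 \pm \epsilon)\, \mathbb{E}[\varphi(Z)]$. Let $Y$ be uniformly distributed on $[-\delta, \delta]^k$ and 
observe that the random variable $X = \hat X + Y$ in law. Therefore, 

$$\begin{aligned}
\mathbb{E}[\varphi(X)] = \mathbb{E}[\varphi(\hat X + Y)] &= \mathbb{E}[\varphi(\hat X)] \pm \mathbb{E}[\varphi(Y)] = \mathbb{E}[\varphi(\hat X)] \pm \delta\, \mathbb{E}[\varphi(Y/\delta)] \\
&= \mathbb{E}[\varphi(\hat X)] \pm \delta\, \mathbb{E}_{Z' \in_u [-1,1]^k}[\varphi(Z')] = \mathbb{E}[\varphi(\hat X)] \pm \delta\, \mathbb{E}[\varphi(Z)] \quad \text{(Lemma 5.7)}. \qquad (4.2)
\end{aligned}$$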

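Thus, by Lemma 4.2, 

$$\mathbb{E}[\varphi(\hat X)] = (1 \pm O(\epsilon))\, \mathbb{E}[\varphi(Z)]. \qquad (4.3)$$

We next prune $\hat\mu^k$ to reduce its support. Define $p : \mathbb{R}^k \to \mathbb{R}_+$ by $p(x) = \hat\mu^k(x)$. Clearly, $p(x)$, being 
a product distribution, is computable in $\mathrm{poly}(k, 1/\epsilon)$ time. 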

Let $S = (\delta(\mathbb{Z} + 1/2))^k \cap B_2(3\sqrt{k})$, where $B_2(r) \subseteq \mathbb{R}^k$ denotes the Euclidean ball of radius $r$. 
As $\varphi$ has Lipschitz constant bounded by $k^{O(1)}$, a simple calculation shows that throwing away all 
points in the support of $\hat X$ outside $S$ does not change $\mathbb{E}[\varphi(\hat X)]$ much. It is easy to check that for 
$x \notin S$, $p(x) \le \exp(-\|x\|_2^2/4)/(2\pi)^{k/2}$. Therefore, 



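$$\begin{aligned}
\mathbb{E}[\varphi(\hat X)] = \sum_{x} p(x) \varphi(x) &= \sum_{x \in S} p(x) \varphi(x) + \sum_{x \notin S} p(x) \varphi(x) \\
&= \sum_{x \in S} p(x) \varphi(x) \pm \sum_{x \notin S} \frac{\exp(-\|x\|_2^2/4)}{(2\pi)^{k/2}} \cdot \big(k^{O(1)} \|x\|_2\big) = \sum_{x \in S} p(x) \varphi(x) \pm 2^{-\Omega(k)}. \qquad (4.4)
\end{aligned}$$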

From Equation 4.3 and the above equation we get (recall that $\mathbb{E}[\varphi(Z)] \ge 1$) 

$$\mathbb{E}[\varphi(Z)] = (1 \pm O(\epsilon)) \Big( \sum_{x \in S} p(x) \varphi(x) \Big),$$

which is what we want to show. 

We now reason about the complexity of $S$. First, by a simple covering argument, 

$$|S| \le \frac{\mathrm{Vol}(B_2(3\sqrt{k}) + [-\delta, \delta]^k)}{\mathrm{Vol}([-\delta, \delta]^k)} = (1/\delta)^{O(k)} = (1/\epsilon)^{O(k)},$$

where for sets $A, B \subseteq \mathbb{R}^k$, $A + B$ denotes the Minkowski sum and Vol denotes Lebesgue volume. This 
size bound almost suffices to prove Theorem 4.1 except for the complexity of enumerating elements 
from $S$. Without loss of generality assume that $R = 3\sqrt{k}/\delta$ is an integer. Then, enumerating 
elements in $S$ is equivalent to enumerating integer points in the $k$-dimensional ball of radius $R$. 
This can be accomplished by going through the set of lattice points in the natural lexicographic 
order, and takes $\mathrm{poly}(k, 1/\epsilon)$ time and $O(k \log(1/\epsilon))$ space per point in $S$. □ 
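The lexicographic enumeration can be sketched with a recursive generator that only keeps the current prefix of coordinates in memory. This is one possible implementation under stated assumptions, not necessarily the procedure the proof has in mind: it prunes a prefix as soon as its squared norm exceeds the radius, and the function name and parameters are hypothetical.

```python
import math

def lattice_points_in_ball(k, delta, radius):
    """Yield the points of (delta*(Z+1/2))^k with Euclidean norm <= radius,
    in lexicographic order, storing only the current prefix (O(k) numbers)."""
    def extend(prefix, budget):
        # budget = radius^2 minus the squared norm of the prefix (always >= 0 here)
        if len(prefix) == k:
            yield list(prefix)
            return
        bound = math.sqrt(budget) / delta
        lo = math.ceil(-bound - 0.5)        # smallest index i with delta*(i+1/2) >= -sqrt(budget)
        hi = math.floor(bound - 0.5)        # largest index i with delta*(i+1/2) <= sqrt(budget)
        for i in range(lo, hi + 1):
            t = delta * (i + 0.5)
            if t * t <= budget:             # guard against floating-point edge cases
                prefix.append(t)
                yield from extend(prefix, budget - t * t)
                prefix.pop()
    yield from extend([], radius * radius)

# Small hypothetical example: the unit-spaced shifted grid in the plane, radius 1.
example = list(lattice_points_in_ball(2, 1.0, 1.0))
```

Because the generator yields one point at a time, a caller can accumulate the weighted sum of Theorem 4.1 without ever storing the set $S$.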



5 Proof of Lemma 4.2 

Our starting point is the following definition that helps us compare multivariate distributions when 
we are only interested in volumes of convex sets. We shall follow the notation of [Bal03]. 

Definition 5.1. Given two symmetric pdfs $f, g$ on $\mathbb{R}^k$, we say that $f$ is less peaked than $g$ ($f \preceq g$) 
if for every symmetric convex set $K \subseteq \mathbb{R}^k$, $f(K) \le g(K)$. 

We also need the following elementary facts. The first follows from the unimodality of the 
Gaussian density and the second from partial integration. 

Fact 5.2. For any $\delta > 0$ and $\mu$ as defined by Equation 4.1, $\mu$ is less peaked than $\gamma$. 

Fact 5.3. Let $f, g$ be distributions on $\mathbb{R}^k$ with $f \preceq g$. Then for any semi-norm $\varphi : \mathbb{R}^k \to \mathbb{R}$, 
$\mathbb{E}_f[\varphi(x)] \ge \mathbb{E}_g[\varphi(x)]$. 

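Proof. Observe that for any $t > 0$, $\{x : \varphi(x) \le t\}$ is a symmetric convex set. Let random variables $X \leftarrow f$, $Y \leftarrow g$. 
Since $f \preceq g$, $\Pr[\varphi(X) > t] \ge \Pr[\varphi(Y) > t]$ for every $t > 0$. Then, by partial integration, 
$\mathbb{E}[\varphi(X)] = \int_0^\infty \Pr[\varphi(X) > t]\, dt \ge \int_0^\infty \Pr[\varphi(Y) > t]\, dt = \mathbb{E}[\varphi(Y)]$. □ 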

The above statements give us a way to compare the expectations of $\mu$ and $\gamma$ for one-dimensional 
convex functions. We would now like to do a similar comparison for the product distributions $\mu^k$ 
and $\gamma^k$. For this we use Kanter's lemma [Kan77], which says that the relation $\preceq$ is preserved under 
tensoring if the individual distributions have the additional property of being unimodal. 

Definition 5.4. A distribution $f$ on $\mathbb{R}^n$ is unimodal if $f$ can be written as an increasing limit of a 
sequence of distributions each of which is a finite positively weighted sum of uniform distributions 
on symmetric convex sets. 

Theorem 5.5 (Kanter's Lemma [Kan77]; cf. [Bal03]). Let $\mu_1, \mu_2$ be symmetric distributions on $\mathbb{R}^n$ 
with $\mu_1 \preceq \mu_2$ and let $\nu$ be a unimodal distribution on $\mathbb{R}^m$. Then, the product distributions $\mu_1 \times \nu$, 
$\mu_2 \times \nu$ on $\mathbb{R}^n \times \mathbb{R}^m$ satisfy $\mu_1 \times \nu \preceq \mu_2 \times \nu$. 
We next show that $\mu$ "sandwiches" $\gamma$ in the following sense. 

Lemma 5.6. Let $\nu$ be the pdf of the random variable $y = (1 - \epsilon)x$ for $x \leftarrow \mu$. Then, for $\delta \le (2\epsilon)^{3/2}$, 
$\mu \preceq \gamma \preceq \nu$. 

Proof. As mentioned above, $\mu \preceq \gamma$. We next show that $\gamma \preceq \nu$. Intuitively, $\nu$ is obtained by 
spreading the mass that $\gamma$ puts on an interval $I = [i\delta, (i+1)\delta)$ evenly on the smaller interval 
$(1 - \epsilon)I$. The net effect of this operation is to push the pdf of $\mu$ closer towards the origin and for 
$\delta$ sufficiently small the inward push from this "shrinking" wins over the outward push from going 
to $\mu$. 

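Fix an interval $I = [-i\delta(1-\epsilon) - \theta,\ i\delta(1-\epsilon) + \theta]$ for $0 \le \theta < \delta(1-\epsilon)$. Then, 

$$\begin{aligned}
\nu(I) &= \nu([-i\delta(1-\epsilon),\ i\delta(1-\epsilon)]) + 2\nu([i\delta(1-\epsilon),\ i\delta(1-\epsilon) + \theta]) && (5.1) \\
&= \gamma([-i\delta, i\delta]) + \frac{2\theta \cdot \gamma([i\delta, (i+1)\delta))}{\delta(1-\epsilon)}. && (5.2)
\end{aligned}$$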

We now consider two cases. 

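Case 1: $i \ge (1 - \epsilon)/\epsilon$, so that $i\delta(1-\epsilon) + \theta < i\delta$. Then, from the above equation, 

$$\nu(I) \ge \gamma([-i\delta, i\delta]) \ge \gamma([-i\delta(1-\epsilon) - \theta,\ i\delta(1-\epsilon) + \theta]) = \gamma(I).$$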

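Case 2: $i < (1 - \epsilon)/\epsilon$. Let $\alpha = (i+1)\delta \le \delta/\epsilon$. Then, as $1 - x^2/2 \le e^{-x^2/2} \le 1$, 

$$\gamma((i\delta, i\delta + \theta]) \le \theta \cdot \gamma(0), \qquad \gamma([i\delta, (i+1)\delta)) \ge \delta \cdot \gamma(0) \cdot (1 - \alpha^2/2).$$

Therefore, 

$$\begin{aligned}
\nu(I) &= \gamma(I) - 2\gamma((i\delta,\ i\delta(1-\epsilon) + \theta]) + \frac{2\theta \cdot \gamma([i\delta, (i+1)\delta))}{\delta(1-\epsilon)} \\
&\ge \gamma(I) - 2\gamma((i\delta,\ i\delta + \theta]) + \frac{2\theta \cdot \gamma([i\delta, (i+1)\delta))}{\delta(1-\epsilon)} \\
&\ge \gamma(I) - 2\theta \gamma(0) + \frac{2\theta \cdot \delta \cdot \gamma(0) \cdot (1 - \alpha^2/2)}{\delta(1-\epsilon)} \\
&= \gamma(I) + \frac{2\theta \gamma(0)}{1-\epsilon} \cdot (\epsilon - \alpha^2/2) \ge \gamma(I),
\end{aligned}$$

for $\alpha^2 \le 2\epsilon$, i.e., if $\delta \le (2\epsilon)^{3/2}$. □ 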
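The sandwich $\mu \preceq \gamma \preceq \nu$ can be spot-checked numerically on symmetric intervals $[-t, t]$. The sketch below is only a sanity check on a finite grid, not a proof; the helper names, the value $\epsilon = 0.1$, and the conservative choice $\delta = \epsilon^{3/2}$ (which satisfies $\delta \le (2\epsilon)^{3/2}$) are assumptions for the example.

```python
import math

def gamma_mass(t):
    """Standard Gaussian mass of the symmetric interval [-t, t]."""
    return math.erf(t / math.sqrt(2.0))

def mu_mass(t, delta):
    """Mass of [-t, t] under the piecewise-flat density mu with cell width delta."""
    i = math.floor(t / delta)
    inner = gamma_mass(i * delta)                 # cells lying entirely inside [-t, t]
    outer = gamma_mass((i + 1) * delta)           # inner plus the two partially covered cells
    return inner + (outer - inner) * (t - i * delta) / delta   # mu is flat on each cell

def nu_mass(t, delta, eps):
    """nu is the law of (1 - eps) * x for x ~ mu, so nu([-t, t]) = mu([-t/(1-eps), t/(1-eps)])."""
    return mu_mass(t / (1.0 - eps), delta)

eps = 0.1
delta = eps ** 1.5
tol = 1e-12                                       # slack for floating-point rounding
grid = [0.001 * j for j in range(1, 6001)]
sandwich_holds = all(
    mu_mass(t, delta) <= gamma_mass(t) + tol and gamma_mass(t) <= nu_mass(t, delta, eps) + tol
    for t in grid
)
```

At multiples of $\delta$ the masses of $\mu$ and $\gamma$ coincide, which is why a small tolerance is used instead of a strict comparison.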

Lemma 4.2 follows easily from the above two claims. 

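Proof of Lemma 4.2. Clearly, $\mu, \nu, \gamma$ are unimodal and a product of unimodal distributions is 
unimodal. Thus, from the above lemma and iteratively applying Kanter's lemma we get $\mu^k \preceq \gamma^k \preceq \nu^k$. 
Therefore, by Fact 5.3, for any semi-norm $\varphi$, 

$$\mathbb{E}_{X \leftarrow \mu^k}[\varphi(X)] \ge \mathbb{E}_{Y \leftarrow \gamma^k}[\varphi(Y)] \ge \mathbb{E}_{X \leftarrow \nu^k}[\varphi(X)] = \mathbb{E}_{X \leftarrow \mu^k}[\varphi((1-\epsilon)X)] = (1-\epsilon)\, \mathbb{E}_{X \leftarrow \mu^k}[\varphi(X)].$$

□ 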

We now prove the auxiliary lemma we used in proof of Theorem 4.1. 

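Lemma 5.7. Let $\rho$ be the uniform distribution on $[-1, 1]$. Then, $\gamma \preceq \rho$ and for any semi-norm 
$\varphi : \mathbb{R}^k \to \mathbb{R}$, $\mathbb{E}_{\rho^k}[\varphi(x)] \le \mathbb{E}_{\gamma^k}[\varphi(x)]$. 

Proof. It is easy to check that $\gamma \preceq \rho$. Then, by Kanter's lemma $\gamma^k \preceq \rho^k$ and the inequality follows 
from Fact 5.3. □ 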






6 A PTAS for Supremum of Gaussian Processes 



Our main theorem, Theorem 1.2, follows immediately from Lemma 3.3 and Theorem 2.1 applied 
to the semi-norm $\varphi : \mathbb{R}^k \to \mathbb{R}$ defined by $\varphi(x) = \sup_{i \le m} |\langle A(v_i), x \rangle|$. 

7 Lower Bound for Oblivious Estimators 

We now show that Theorem 4.1 is optimal: any oblivious linear estimator for semi-norms as in the 
theorem must make at least $(C/\epsilon)^k$ queries for some constant $C > 0$. 

Let $S \subseteq \mathbb{R}^k$ be the set of query points of an oblivious estimator. That is, there exists a function 
$f : \mathbb{R}_+^{|S|} \to \mathbb{R}_+$ such that for any semi-norm $\varphi : \mathbb{R}^k \to \mathbb{R}_+$, $f((\varphi(x) : x \in S)) = (1 \pm \epsilon)\, \mathbb{E}_{Y \leftarrow \mathcal{N}^k}[\varphi(Y)]$. 
We will assume that $f$ is monotone in the following sense: $f(x_1, \ldots, x_{|S|}) \le f(y_1, \ldots, y_{|S|})$ if 
$0 \le x_i \le y_i$ for all $i$. This is clearly true for any linear estimator (and also for the median 
estimator). Without loss of generality suppose that $\epsilon < 1/4$. 

The idea is to define a suitable semi-norm based on $S$: define $\varphi : \mathbb{R}^k \to \mathbb{R}$ by $\varphi(x) = 
\sup_{u \in S} |\langle u/\|u\|_2, x \rangle|$. It is easy to check that for any $v \in S$, $\|v\|_2 \le \varphi(v)$. Therefore, the 
output of the oblivious estimator when querying the Euclidean norm is at most the output of the 
estimator when querying $\varphi$. In particular, 

$$(1 - \epsilon)\, \mathbb{E}_{Y \leftarrow \mathcal{N}^k}[\|Y\|_2] \le f((\|x\|_2 : x \in S)) \le f((\varphi(x) : x \in S)) \le (1 + \epsilon)\, \mathbb{E}_{Y \leftarrow \mathcal{N}^k}[\varphi(Y)]. \qquad (7.1)$$
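The semi-norm built from the query set is easy to write down explicitly. The following sketch constructs it for a small, made-up query set and checks that it agrees with the Euclidean norm on the query points while being smaller elsewhere; the function names and the three sample points are hypothetical.

```python
import math

def norm2(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in x))

def adversarial_seminorm(query_points):
    """Return phi(x) = max over query points u of |<u/||u||_2, x>|."""
    units = [[c / norm2(u) for c in u] for u in query_points if norm2(u) > 0]
    def phi(x):
        return max(abs(sum(a * b for a, b in zip(u, x))) for u in units)
    return phi

# Hypothetical query set in the plane.
S = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
phi = adversarial_seminorm(S)
on_queries = [(phi(v), norm2(v)) for v in S]      # phi matches the Euclidean norm on S
off_query = phi([1.0, -1.0])                      # strictly below sqrt(2) away from the directions in S
```

On the query points the estimator therefore cannot tell $\varphi$ apart from a function that is at least the Euclidean norm, even though $\varphi$ is much smaller in directions far from every point of $S$.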

We will argue that the above is possible only if $|S| \ge (C/\epsilon)^k$. Let $S^{k-1}$ denote the unit sphere in $\mathbb{R}^k$. 
For the remaining argument, we shall view $Y \leftarrow \mathcal{N}^k$ to be drawn as $Y = RX$, where $X \in S^{k-1}$ is 
uniformly random on the sphere and $R \in \mathbb{R}$ is independent of $X$, with $R^2$ having a Chi-squared distribution 
with $k$ degrees of freedom. Let $S(\epsilon) = \bigcup_{u \in S} \{y \in S^{k-1} : |\langle u/\|u\|_2, y \rangle| \ge 1 - 4\epsilon\}$. 

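Now, by a standard volume argument, for any $y \in S^{k-1}$, $\Pr_X[|\langle X, y \rangle| \ge 1 - 4\epsilon] \le (O(\epsilon))^k$. 
Thus, by a union bound, $p = \Pr_X[X \in S(\epsilon)] \le |S| \cdot (O(\epsilon))^k$. Further, for any $y \in S^{k-1} \setminus S(\epsilon)$, 
$\varphi(y) < 1 - 4\epsilon$. Therefore, 

$$\mathbb{E}[\varphi(X)] = \Pr[X \notin S(\epsilon)] \cdot \mathbb{E}[\varphi(X) \mid X \notin S(\epsilon)] + \Pr[X \in S(\epsilon)] \cdot \mathbb{E}[\varphi(X) \mid X \in S(\epsilon)] \le (1 - p)(1 - 4\epsilon) + p.$$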

Thus, 

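$$\mathbb{E}[\varphi(Y)] = \mathbb{E}[\varphi(RX)] = \mathbb{E}[R] \cdot \mathbb{E}[\varphi(X)] \le \mathbb{E}[\|Y\|_2] \cdot ((1 - p)(1 - 4\epsilon) + p). \qquad (7.2)$$

Combining Equations 7.1 and 7.2, we get 

$$1 - \epsilon \le (1 + \epsilon) \cdot ((1 - p)(1 - 4\epsilon) + p) \le 1 - 3\epsilon + 2p.$$

As $p \le |S| \cdot (O(\epsilon))^k$, the above leads to a contradiction unless $|S| \ge (C/\epsilon)^k$ for some constant 
$C > 0$. 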